
Term-Weighting Learning via Genetic Programming for Text Classification


Abstract

This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining a TWS determines the way in which documents will be represented in a vector space model, before applying a classifier. Whereas acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), the definition of TWSs has been traditionally an art. Further, it is still a difficult task to determine what is the best TWS for a particular problem and it is not clear yet, whether better schemes, than those currently available, can be generated by combining known TWS. We propose in this article a genetic program that aims at learning effective TWSs that can improve the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to give rise to discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. Our study shows the validity of the proposed method; in fact, we show that TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent works. Further, we show that TWSs learned from a specific domain can be effectively used for other tasks.
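To make the role of a TWS concrete, the following minimal Python sketch shows how the two standard schemes named in the abstract (Boolean and term-frequency) map documents to vectors, and how basic units can be combined into a new scheme. It is an illustration only, not the paper's method: the function names, the whitespace tokenizer, and the toy documents are all assumptions, and the TF-IDF combination is a well-known scheme used here as an example of "combining basic units", not one taken from the abstract. The genetic-programming search itself is not implemented.

```python
import math
from collections import Counter


def term_counts(docs):
    """Return the sorted vocabulary and one Counter of term counts per document."""
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def boolean_tws(tf, df, n_docs):
    """Boolean scheme: 1 if the term occurs in the document, else 0."""
    return 1.0 if tf > 0 else 0.0


def tf_tws(tf, df, n_docs):
    """Term-frequency scheme: the raw count of the term in the document."""
    return float(tf)


def tf_idf_tws(tf, df, n_docs):
    """A hand-written combination of two basic units (tf and idf)."""
    return tf * math.log(n_docs / df) if df > 0 else 0.0


def vectorize(docs, tws):
    """Represent each document as a vector of term weights under the given TWS."""
    vocab, counts = term_counts(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for c in counts if c[t] > 0) for t in vocab}
    vectors = [[tws(c[t], df[t], n_docs) for t in vocab] for c in counts]
    return vocab, vectors


# Hypothetical toy corpus.
docs = ["good movie good plot", "bad movie"]
vocab, boolean_vectors = vectorize(docs, boolean_tws)
_, tf_vectors = vectorize(docs, tf_tws)
_, tf_idf_vectors = vectorize(docs, tf_idf_tws)
```

Each scheme is a function of the same basic units (here term frequency, document frequency, and corpus size), so swapping one scheme for another changes only the weighting function and leaves the rest of the pipeline unchanged. A learned TWS, in the sense of the abstract, would be another such function discovered automatically instead of written by hand.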
